This article develops new theory and methodology for the forecasting of
extreme and/or record values in an exchangeable sequence of random
variables. The Hill tail index estimator for long-tailed distributions is
modified so as to be appropriate for prediction of future variables. Some
basic issues regarding the use of finite, versus infinite idealized models,
are discussed. It is shown that the standard idealized long-tailed model
with tail index $\alpha \le 2$ can lead to unrealistic predictions if the
observable data is assumed to be unbounded. However, if the model is
instead viewed as valid only for some appropriate finite domain, then it is
compatible with, and leads to sharper versions of, sensible methods for
prediction. In particular, the prediction of the next record value is then
at most a few multiples of the current record. It is argued that there is
no more reason to eschew posterior expectations for forecasting in the
context of long-tailed distributions than to do so in any other context,
such as in the many applications where expectations are routinely used for
scientific inference and decision-making. Computer simulations are used to
demonstrate the effectiveness of the methodology, and its use in
forecasting is illustrated.

1. Introduction



Consider a sequence $X_1, \ldots, X_n$ of positive random variables that is
exchangeable. We say that $X_{n+1}$ is a (new) record value if
$X_{n+1} > X_i$, for $i = 1, \ldots, n$. See [2] for some related
discussion of record values in the iid case. The problem that we address
concerns forecasting of the next observation, $X_{n+1}$, given that it is a
record value, conditional upon the data $X_i = x_i$, for
$i = 1, \ldots, n$. In other words, given that $X_{n+1}$ sets a new record,
how large will it be?

In the Bayesian approach, with squared error loss, the forecast of
$X_{n+1}$, conditional upon the data $x_1, \ldots, x_n$, and upon
$X_{n+1} > \max[x_1, \ldots, x_n]$, is simply the posterior expectation of
$X_{n+1}$ conditional upon the same information. Note that if a sequence is
exchangeable, then the future variables are also conditionally
exchangeable, given the realization of the first $n$ variables. Hence each
of the next $N$ observations has in fact the same posterior predictive
distribution. The posterior expectation for $X_{n+j}$, conditional upon
$X_{n+j}$ being larger than each of the first $n$ observations, is then the
same for each $j \ge 1$. It may be noted that there are two quite different
questions that arise concerning the forecasting of future record values.
The first concerns the forecasting of when the next record value will
occur, while the second concerns the forecasting of the magnitude of the
next record value. In this article we only consider the second question.¹

Although we focus attention here only on the prediction of the magnitude of
$X_{n+1}$ given that it sets a new record, there is a relatively
straightforward extension of these results to the evaluation of the
posterior expectation of $X_{n+j}$, given that it sets a new record. To
obtain the prediction of the next record value, conditional upon the data
$x_1, \ldots, x_n$, and upon $X_{n+j}$ being the next new record value, we
must evaluate the posterior expectation of $X_{n+j}$, conditional upon the
collection of inequalities that define the event that $X_{n+j}$ is the next
record value. This can be done by a generalization of the procedure for
forecasting $X_{n+1}$, conditional upon its being the next record value.
For example, the posterior expectation of $X_{n+2}$, conditional upon its
setting a new record, can be obtained by conditioning upon the event that
$X_{n+1}$ sets a new record, and then making the same type of evaluation as
above for $X_{n+2}$, given that it is a record value; or alternatively, by
conditioning upon the event that $X_{n+1} < \max[X_1, \ldots, X_n]$, and
then evaluating the posterior expectation of $X_{n+2}$, given that it is
larger than $\max[X_1, \ldots, X_n]$. Since in the Bayesian framework with
a specified a priori distribution, the posterior probability that $X_{n+1}$
sets a new record is known, there is no difficulty in principle in
extending the analysis for the posterior expectation of $X_{n+1}$, given
that it sets a new record, to the forecasting of the magnitude of future
record values. Explicit algorithms for doing so will appear in a later
paper.

Although the present paper deals only with the evaluation of the posterior
expectation of $X_{n+1}$, given that it sets a record, we shall nonetheless
sometimes speak of forecasting the magnitude of future record values, since
this can be achieved by the same basic methods. Similarly, one can obtain
the posterior expectation of the maximum over some finite horizon, say the
maximum of $X_{n+1}, \ldots, X_{n+N}$, given that this maximum exceeds our
current record value. This is a problem of considerable practical
importance both in economic forecasting of interest rates, and in
engineering design, where for example, one desires to build a structure
capable of withstanding severe winds or earthquake tremors over a certain
period of time. To the best of my knowledge such forecasting has never been
attempted before in the sense of providing a procedure that could be
recommended for serious consideration in real-world problems.

¹ For those unfamiliar with exchangeability, it may be remarked that
exchangeable sequences are strictly stationary processes, and can be
strongly dependent. An interesting and important class of exchangeable
processes consists of the Markov-Polya processes, discussed in [3,4,5,6],
which play a major role in the theory of stochastic chaos.

If we assume a conventional statistical model with some unknown parameter
$\theta$, then in principle these are straightforward Bayesian problems,
since one can integrate out unknown parameters with respect to their
posterior distribution, to obtain the predictive distribution for a new
observation; and then condition also upon such a new observation being a
record value, in order to answer the question. For example, one could
obtain the posterior expectation and variance for $X_{n+1}$, given that it
is a record value. However, in typical real-world problems involving
forecasting of such extreme values, the model is always uncertain and often
unreliable. This is especially so in the tails of the distribution, where
there is little, if any, past data to rely upon. Thus to obtain reliable
forecasts requires serious attention to model uncertainty. See Hill [7] for
discussion of the selection of models from a Bayesian viewpoint, Poirier
[8] for a Bayesian analysis of some theoretical models in economics, and
Singpurwalla and Meinhold [9] for Bayesian robustification theory in a
closely related area.

In this paper we attempt to deal with the problem by using the formulation
for inference about the tails of the distribution initiated in [1]. See
[10] for an exposition, and Csorgo et al. [11] for related asymptotic
theory. This approach utilizes only the upper order statistics of the past
data for inference about the upper tail, since it is only such order
statistics that fall in the upper tail where the form of the distribution
is assumed known. Seriously to utilize the information in the other order
statistics requires knowledge concerning the global form of the
distribution, and such knowledge is often unavailable. Suppose that given
the parameter $\alpha$, the upper tail of a distribution $F$ on the
positive real line is of algebraic form, with tail index $\alpha$. We
assume that

$$1 - F(t) = P(X > t \mid \alpha) = C \times t^{-\alpha},$$

for $C > 0$, $\alpha > 0$, and $t$ in some interval $(A, k)$ that is
considered relevant for prediction of future observations. It is supposed
that a random sample $X_i = x_i$, for $i = 1, \ldots, n$, from the
distribution is available, and based upon this data we wish to forecast the
next observation $X_{n+1}$. Such prediction in the Bayesian context amounts
to putting forth a posterior distribution for $X_{n+1}$, that is obtained
by integrating out unknown parameters such as $\alpha$, with respect to
their posterior distribution, and then making appropriate forecasts by
minimizing posterior expected loss with respect to some loss function. In
this article we consider only squared error loss, but our methods can be
used in connection with any loss function believed appropriate. See
Aitchison and Dunsmore [12] and Maret [13] for the Bayesian theory and
methodology of such predictive distributions.
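As a minimal sketch of the tail model above, the following Python draws values from the algebraic tail restricted to an interval $[A, k]$ by inverting the conditional distribution function. The function name `sample_truncated_tail`, the use of the standard `random` module, and the numerical settings are illustrative assumptions, not part of the methodology described in the text.

```python
import random


def sample_truncated_tail(alpha, A, k, rng):
    """Draw one value from the density proportional to t**(-alpha - 1) on [A, k]."""
    u = rng.random()
    if alpha == 0.0:
        # alpha = 0: the logarithm of the observation is uniform on [log A, log k].
        return A * (k / A) ** u
    lo = A ** (-alpha)
    hi = k ** (-alpha)
    # Invert the conditional distribution function (lo - t**-alpha) / (lo - hi) = u.
    return (lo - u * (lo - hi)) ** (-1.0 / alpha)


# Illustrative settings (assumed): alpha = 0.5, A = 1, k = 5.
rng = random.Random(0)
draws = [sample_truncated_tail(0.5, 1.0, 5.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

With these settings the conditional mean works out to $\sqrt{5}$ times $A$, so the sample mean should fall close to 2.24.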

Often a simple summary of the posterior predictive distribution, such as
the posterior expectation and variance of $X_{n+1}$, suffices for many
practical purposes. In typical applications $A$ will be the largest order
statistic of the past data; $k$ can sometimes be $+\infty$, but for reasons
discussed below will often instead be some modest multiple of $A$. We might
be interested, for example, in forecasting the next observation, $X_{n+1}$,
conditional upon its being between $x_{(1)}$ and $5 \times x_{(1)}$, where
$x_{(1)}$ is the largest order statistic of the past data. Forecasting of
such a record value is an especially difficult part of the overall
forecasting problem, since by assumption there is no past data of this
magnitude. Yet in forecasting extreme values, it is necessary to consider
precisely the situation in which the observation is more extreme than
anything yet seen. For example, in designing a structure to resist high
winds, one must make allowance for forces more extreme than have yet been
experienced. It would be foolish to imagine that such forces have already
been observed at their maximum.

The best that one can do in such circumstances 
is to use what relevant theory exists, making sure 
that such theory is compatible with the data that 
has been seen. In this article we shall rely on the 
theory of long-tailed distributions, in which the tail 
is known to be of algebraic form at least in some 
interval. Many data sets are known to be of this 
form. Examples include income distributions, city 
size distributions, distributions of genera by spe- 
cies, insurance claim sizes, word frequency distri- 
butions, stock market fluctuations, and many 
others. See Zipf [14] for graphical presentation of a 
great variety of data in support of his theory for 
long-tailed distributions. Several theoretical mod- 
els have been proposed for such data. These in- 
clude the probability models of Yule [15], Hill 
[16,17,18], Hill and Woodroofe [19,20], and Hill, 
Lane and Sudderth [3,4]. See Johnson and Kotz 
[21] for discussion of the model of Hill [22,17], 
which was the starting point for the later models. 
As pointed out by Chatterjee and Yilmaz [23], 
some of these models are related to stochastic 
models for chaos. 



We are particularly interested in the case where $\alpha$ is not large, so
we are dealing with a truly long-tailed distribution. For any $\alpha > 0$
the distribution of $X_{n+1}$ is proper, even when $k = \infty$. However,
for fixed known $\alpha \le 1$ the expectation of $X_{n+1}$ is infinite if
there is no finite upper bound for the data, and the variance of $X_{n+1}$
is infinite if $\alpha \le 2$. Also, if $\alpha \ge 1$ is unknown, which is
ordinarily the case, the posterior distribution for $\alpha$ must give
sufficiently small weight to values of $\alpha$ near 1, in order for the
posterior expectation of $X_{n+1}$ to be finite. This gives rise to an
important practical issue for Bayesians, since the predictions are then
very sensitive to the precise form of the a priori distribution for
$\alpha$ near 1, and the results are not robust. Similarly, if
$\alpha \ge 2$ is unknown, the posterior distribution for $\alpha$ must
give sufficiently small weight to values of $\alpha$ near 2, in order for
the posterior variance of $X_{n+1}$ to be finite.²

In view of such nonrobustness, it is necessary to proceed more carefully
than in most problems of statistical inference and prediction. Our method
is to take explicit account of the boundedness of the observations. In many
real-world applications of extreme value theory, where one deals with
maximal temperatures, wind velocities, rainfall, etc., the data are
generally considered to be bounded. For example, a wind velocity even
double the highest ever previously experienced must be regarded as
extremely improbable. Even if such could occur, it might be regarded as
indicating a basic change in climate such as would invalidate all standard
assumptions, and so require modification of existing theory. This suggests
that a realistic analysis of the problem should incorporate a finite upper
bound, say $K$, for the data.³ Such a bound might be taken a good deal
larger than is ordinarily believed reasonable. A 10-fold increase above a
previous record value that was based upon substantial data would often be
too large, but is worthy of consideration. If such an upper bound is
incorporated in the analysis, then as shown below, even if $\alpha \le 1$
there is no problem with infinite moments. We will typically assume some
known finite upper bound $K$, perhaps much too large, but we will not
necessarily assume that $\alpha > 1$, and will let the data speak for
themselves in this regard. Since the density in the tail is proportional to
$t^{-\alpha-1}$, we see that $\alpha = 0$ corresponds (in the tail) to a
uniform distribution for the logarithm of the observation. Such a
distribution is often used by Bayesians to represent diffuse a priori
knowledge about a positive quantity such as a variance.

² Some may think that because of such issues one should be considering
inference about percentiles, such as the median, rather than the
expectation. However, means are often of particular interest and importance
in real-world problems, and of course are appropriate for squared error
loss. If there were no technical difficulties at infinity with the
expectation, would anyone argue against its use for prediction?

³ Instead of requiring that the mass be exactly 0 beyond a certain known
bound $K$, one can alternatively require that the mass beyond this bound be
so negligible as to be of no interest. In the subjective Bayesian approach
it would be remarkable for anyone to have a probability of 0, to infinitely
many decimal points, for a logically possible event. However, whether or
not 0 is taken literally, in effect one ordinarily ignores values of the
observation larger than the bound. For the purposes of this article we
treat such negligible mass as though it were 0. An alternative and nearly
equivalent way to deal with the problem is to consider only conditional
inference, given that the observations are no larger than the bound. A
general theory and methodology for such conditional inference is proposed
in [24].

Our precise model is as follows. We assume that there exists a known
constant $K$ such that $0 \le X \le K$, so that $K$ is a known upper bound
for the data. In applications, ordinarily $K < \infty$, but for
completeness we shall also discuss the case $K = \infty$, which is
sometimes appropriate and is mathematically convenient when
$\alpha > 2 + \epsilon > 2$, in which case no problems arise due to
infinite first or second moments. We do not assume in applications that one
can necessarily determine a smallest such $K$, but merely that one can pick
some bound. We also assume that there exist constants $k$ and $A$ with
$K > k > A > 0$, such that the tail is algebraic, to an adequate
approximation, for $A \le t \le k$, with 0 mass beyond $K$. Let
$x_{(1)} > \cdots > x_{(n)}$ be the descending order statistics of the past
data. Ordinarily we take $A$ to be the largest order statistic of the past
data, $A = x_{(1)}$. The quantity $k$ is the key variable in our analysis.
It represents the point up to which the algebraic assumption is assumed to
be valid. $k$ is not a parameter in the usual sense, but is more in the
nature of a decision variable, since in applications the tail will not be
exactly algebraic in any interval, but it will nevertheless be reasonable
to act as if it were approximately of this form for some intervals. The
selection of $k$ in part acts as a means to specify the portion of the
distribution that we are particularly interested in. Even if $X > k$ we may
not be interested in forecasting $X$ for such extreme values, since the
occurrence of such would force us to reconsider our modelling assumptions,
as in [7,24,25].

We are in effect assuming a model in which the algebraic behavior holds,
given $\alpha$, to a satisfactory approximation for $A \le X \le k$, and
that eventually there is 0 (or negligible) mass beyond some known $K > k$.
We assume that the same $k$ is appropriate for all values of $\alpha$ being
given positive weight. Between $k$ and $K$ there must be a transition from
the algebraic tail behavior up to $k$ and the negligible mass beyond $K$.
In this transition zone the tail of the distribution may not even be
approximately algebraic, and if algebraic, may have a different tail index.
The mass between $k$ and $K$ need not be entirely negligible, but we assume
there is no data-based or other information concerning the form of the mass
distribution in this interval, apart from the fact that the total mass in
the interval is smaller than $C \times k^{-\alpha}$, as is required by the
model. If $k$ is large enough, then $C \times k^{-\alpha}$, although not
entirely negligible, may be sufficiently small so that the mass between $k$
and $K < \infty$ has only a slight effect upon the posterior moments for
$X_{n+1}$. We shall assume that this is the case, so that the tail
distribution is of algebraic form from $A$ to $k$, while beyond $k$,
although not 0 or entirely negligible, the mass is of no practical
importance for the assessment of the posterior moments of $X_{n+1}$.

Typically, the posterior expectation of $C \times x_{(1)}^{-\alpha}$ will
be of order of magnitude $1/(n+1)$ based on a previous sample of size $n$.
Compare the maximum-likelihood estimator of $C$ in [1, p. 1168]. This also
corresponds to the fiducial analysis of Fisher [26, p. 210], and to the
Bayesian non-parametric procedure $A_n$ of Hill [22,27,35]. Thus before
observing $X_1, \ldots, X_n$, because of the exchangeability there is an
unconditional probability of $1/(n+1)$ that $X_{n+1}$ will be the maximum,
which suggests that even conditionally this will often be of the right
order of magnitude. As shown in [5], there is an explicit parametric model,
called a splitting process, for which this evaluation holds exactly, and
such an evaluation is coherent in the sense of de Finetti [28,29].
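The unconditional probability $1/(n+1)$ mentioned above can be checked by a small simulation. The sketch below is only an illustration under stated assumptions: it uses an iid Pareto sequence (a special case of exchangeability) generated with Python's standard `random.paretovariate`, and the function name `record_frequency`, the sample size, and the tail index 1.5 are hypothetical choices.

```python
import random


def record_frequency(n, trials, rng):
    """Estimate P(X_{n+1} exceeds all of X_1..X_n) for an iid (hence exchangeable) sequence."""
    hits = 0
    for _ in range(trials):
        xs = [rng.paretovariate(1.5) for _ in range(n + 1)]
        if xs[-1] > max(xs[:-1]):
            hits += 1
    return hits / trials


rng = random.Random(1)
n = 9
estimate = record_frequency(n, 40000, rng)  # should be near 1 / (n + 1) = 0.1
```

Any continuous distribution would serve equally well here, since the argument rests only on the symmetry among the $n + 1$ variables.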

The constant $K$ plays virtually no direct role in the following analysis,
but is important because of the delicate issues that arise when
$\alpha \le 2$. In this case if there were no finite upper bound $K$ and
the algebraic tail were assumed valid everywhere beyond $A$, then the
posterior predictive variance of the next observation would be infinite;
and the predictive expectation would also be infinite unless the a priori
distribution for $\alpha$ gave sufficiently small weight to values near 1.
There is no known reason that $\alpha$ must be larger than 2, or even
larger than 1, and the data may in fact clearly suggest that it is smaller
than 1. But an infinite predictive expectation would not correspond to any
real-world problem that I know of concerning extreme data, and I doubt that
one could seriously recommend such predictions. For example, they would
lead to terribly poor performance if predictions were made and assessed
according to some proper scoring rule or loss function. This change in
viewpoint to reflect the boundedness of the data gives rise to some
surprising consequences with regard to prediction.

The key choice concerns not $K$ but $k$, since even if there were a known
finite upper bound $K$ for the data, it might still not be appropriate to
assume the algebraic form all the way up to $K$, but only that in the
domain of practical importance the tail is of this form, say up to $k$,
which is equal to some appropriate upper percentile of the distribution.
This is in essence a modelling assumption, just as when we assume that the
normal model for data is sufficiently closely satisfied to be useful in the
analysis of that data. Modelling assumptions are rarely exactly true, but
they are sometimes indispensable in order to proceed, and often give useful
results. See [7,25,27]. The form of analysis that we recommend is a
conditional analysis, given a specification of $k$. For example, with
$A = x_{(1)}$, we consider predictive inference about the next observation
given that it lies between $x_{(1)}$ and some $k > x_{(1)}$. If
$L = k/x_{(1)}$, then we find that it typically makes a great difference
whether $L$ is of order 5 or order 100, both with respect to the posterior
predictive mean and the posterior predictive variance for the next
observation. Based upon the mathematical and computer analysis in the next
sections, we recommend that the forecaster make a choice of $L$, usually
with $L \le 10$ and sometimes even with $L = 2$. To illustrate, when $L$ is
chosen to be 3, the adequacy of our modelling assumption depends on whether
it is or is not the case that the algebraic form holds between $x_{(1)}$
and $3 \times x_{(1)}$, with the mass beyond $3 \times x_{(1)}$ no longer
even approximately of the algebraic form with the same $\alpha$ as between
$x_{(1)}$ and $3 \times x_{(1)}$, and also with the mass beyond
$3 \times x_{(1)}$ sufficiently small so that for practical purposes it can
be ignored. In principle the optimal choice of $k$ is the largest value for
which the algebraic assumption holds exactly (or in a suitable sense,
approximately); while beyond that $k$ the tail is no longer of that same
form, and also is of little practical importance in the evaluation of the
first two posterior predictive moments. It would be difficult if not
impossible in typical real-world problems to find such an optimal $k$, and
so we recommend that several values of $k$ be chosen, yielding different
values for the posterior predictive moments, and then by means of judgment
and data-analytic methods that a choice be made to yield a forecast. See
for example Sec. 5 of [1] for a closely related type of data analysis. Such
analyses must be made on a computer, rather than purely mathematically, and
can be quite demanding computationally.

We emphasize that it does not seem possible to avoid such considerations as
to the choice of $k$, since in even the best of cases, where the tail of
the distribution is known to be of the algebraic form in the domain of
interest, the only alternative to such an analysis is to simply ignore the
boundedness of the data, and take $k = \infty$. But then our prediction of
the next record value can become infinite, which is absurd in most
real-world problems. Hence the algebraic tailed model with
$1 \le \alpha \le 2$ is not compatible with unbounded data unless the a
priori distribution is chosen to give suitably small weight to values of
$\alpha$ close to 1. There may be little or no evidence for choosing the a
priori distribution in this way, and it does not seem appropriate to do so
merely to avoid the issue, just as it does not seem appropriate to replace
the expectation by the median merely to avoid the issue. At any rate, this
article shows that effective predictions can be made with any prior
distribution for $\alpha$, including cases where $\alpha \le 1$, provided
that one can justify some finite upper bound $K$ for the observations.

Our underlying motivation is that given the unreliability of assessments of
the far upper tail of a distribution, for predictive purposes it may be
appropriate to ignore this far upper tail, i.e., the part beyond $k$, or
equivalently, to condition upon $X$ falling in some finite interval, say
$(x_{(1)}, L \times x_{(1)})$, for which the algebraic assumption is
believed to be valid, and beyond which there is no assumption that is
believed trustworthy. It is implicit in this analysis that there is little
mass beyond $k$, and that in ignoring the case $X \ge k$ for some
appropriately chosen $k$, one loses little, while gaining the power of a
statistical analysis based upon the extreme value model with some
$\alpha > 0$. In the case of a known finite upper bound $K$, in effect we
perform conditional inference, given that the observation is not too large,
and then examine sensitivity to the choice of $k$. The same is true if the
random variable is unbounded and $K = \infty$, since again beyond a certain
percentile one would have no empirical basis for any assumption in the far
upper tail. Whatever extreme value theory exists for tails of distributions
could not be expected to hold literally in the far upper tail of the
distribution, where no data has been observed. Nevertheless, one may have
to make some forecasts, and it would appear reasonable to assume that the
algebraic assumption holds for at least some distance beyond $x_{(1)}$. If
this, or some other assumed model, does not hold beyond $x_{(1)}$ then
plainly no serious theory-based forecasting is possible. But if through
data analysis, as in [1,26], it has been discovered that the algebraic
assumption is acceptable for say the upper $r + 1$ order statistics of the
past data, then it would be reasonable to anticipate that this will also be
true for some distance beyond $x_{(1)}$. A Bayesian theory of data analysis
is put forth in [25] which indicates how the classical Bayesian approach
must be modified to deal with issues that arise from such data analysis.

Finally, real-world data sets of interest in regard to the forecasting of
extreme values are not necessarily of the long-tailed algebraic form that
we have discussed. In this case we recommend that a transformation be first
applied to the data in order to make the upper tail of the long-tailed
form. For example, if the tail is of Weibull form, then taking the
exponential of the appropriate power of $X$ yields an algebraic tail, as
discussed in [1,10]. When the form of the tail is unknown, data-analytic
methods can be used to determine an appropriate transformation. In this
way, having learned how to forecast extreme tails for the long-tailed
distributions as a type of standard case, we can also apply our methods to
distributions not of this form in the upper tail, and then take the inverse
transformation to forecast the extreme values in the original units in
which the data were measured. Such methods are quite common in statistics,
for example in transforming data in order to obtain approximate normality,
using normal methods for analysis of the data, and then transforming back
to the original units. In the Bayesian scenario it is even possible to
provide a strong justification for these methods, since conditional upon
the data, one can quite freely transform the parameters, and obtain the
posterior distribution for the new parameters by the usual calculus of
transformations.
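The Weibull example can be illustrated numerically. The sketch below assumes a Weibull tail of the form $P(X > t) = \exp(-\lambda t^{c})$ with known exponent $c$, in which case $Y = \exp(X^{c})$ satisfies $P(Y > y) = y^{-\lambda}$ for $y \ge 1$. The parametrization, the helper name `to_algebraic_tail`, and the choice $\lambda = 2$, $c = 2$ are assumptions made for illustration only.

```python
import math
import random


def to_algebraic_tail(x, shape):
    """Map a Weibull-tailed value x to exp(x**shape), which has an algebraic tail."""
    return math.exp(x ** shape)


rng = random.Random(2)
shape = 2.0                      # Weibull exponent c (assumed known here)
scale = 2.0 ** (-1.0 / shape)    # chosen so that P(X > t) = exp(-2 * t**shape)
ys = [to_algebraic_tail(rng.weibullvariate(scale, shape), shape)
      for _ in range(50000)]

# After the transformation, P(Y > y) should be close to y**(-2) for y >= 1.
frac_above_2 = sum(y > 2.0 for y in ys) / len(ys)   # theory: 0.25
frac_above_4 = sum(y > 4.0 for y in ys) / len(ys)   # theory: 0.0625
```

A forecast made on the transformed scale would then be mapped back through the inverse transformation, $x = (\ln y)^{1/c}$ under the same assumptions.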



2. Predictive Moments for Known $\alpha$

Our object is to evaluate, as meaningfully and robustly as possible, the
posterior moments

$$E(X_{n+1}^{i} \mid A \le X_{n+1} \le k)$$

for specified $A$ and $k$, and $i = 1, 2$. The primary application will be
in the case where there has been a previous sample, $X_1, \ldots, X_n$. Let
$D$ denote the data $X_1 = x_1, \ldots, X_n = x_n$. Given this data, we
wish to forecast the next observation $X_{n+1}$. It is notationally
convenient to refer to $X_{n+1}$ as $X$ from now on. Since $A$ will usually
be held fixed, we suppress it in the notation. To evaluate the posterior
predictive expectation of $X$ we first condition on $\alpha$, to obtain

$$f(k,\alpha) = E(X \mid A \le X \le k, \alpha),$$

and then we take the expectation of this quantity with respect to the
posterior distribution of $\alpha$ to obtain the predictive expectation of
primary interest.

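Based upon our assumption that the tail is algebraic between $A$ and $k$,
we obtain

$$f(k,\alpha) = \frac{\int_A^k t \cdot t^{-\alpha-1}\,dt}{\int_A^k t^{-\alpha-1}\,dt}.$$

For $L = k/A$, this yields:

$$f(k,\alpha) =
\begin{cases}
A \times \dfrac{\alpha}{\alpha-1} \times \dfrac{1 - L^{1-\alpha}}{1 - L^{-\alpha}} & \text{if } \alpha \ne 0, 1, \\[2ex]
A \times \ln(L) \times \dfrac{L}{L-1} & \text{if } \alpha = 1, \\[2ex]
A \times \dfrac{L-1}{\ln(L)} & \text{if } \alpha = 0.
\end{cases}
\tag{1}$$

For $\alpha \ne 0, 1$, we can also write:

$$f(k,\alpha) = A \times \frac{\alpha}{\alpha-1} \times L \times \frac{L^{\alpha-1} - 1}{L^{\alpha} - 1}. \tag{2}$$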
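The conditional expectation $f(k,\alpha)$ is simple to evaluate directly. The following Python is a minimal sketch, assuming the tail density is proportional to $t^{-\alpha-1}$ on $[A, LA]$; the function names `predictive_mean` and `predictive_mean_numeric` are hypothetical, and the second function is only a midpoint-rule cross-check of the ratio of integrals.

```python
import math


def predictive_mean(alpha, A, L):
    """E(X | A <= X <= k, alpha) for density proportional to t**(-alpha - 1), k = L * A."""
    if L <= 1.0 or A <= 0.0:
        raise ValueError("requires A > 0 and L > 1")
    if alpha == 0.0:
        return A * (L - 1.0) / math.log(L)
    if alpha == 1.0:
        return A * math.log(L) * L / (L - 1.0)
    return (A * alpha / (alpha - 1.0)
            * (1.0 - L ** (1.0 - alpha)) / (1.0 - L ** (-alpha)))


def predictive_mean_numeric(alpha, A, L, steps=20000):
    """Midpoint-rule evaluation of the same ratio of integrals, on a log scale."""
    h = math.log(L) / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        t = A * math.exp((i + 0.5) * h)
        # With t = A * exp(s), dt = t ds, so the integrands pick up a factor t.
        num += t ** (1.0 - alpha) * h
        den += t ** (-alpha) * h
    return num / den


mean_L10 = predictive_mean(0.10, 1.0, 10.0)   # in units of A
```

The closed form and the numerical integral should agree to many decimal places for any $\alpha \ge 0$ and $L > 1$.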



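A similar equation is available for
$f^{(2)}(k,\alpha) \equiv E(X^{2} \mid A \le X \le k, \alpha)$. We obtain:

$$f^{(2)}(k,\alpha) =
\begin{cases}
A^{2} \times \dfrac{\alpha}{2-\alpha} \times \dfrac{L^{2-\alpha} - 1}{1 - L^{-\alpha}} & \text{if } \alpha \ne 0, 2, \\[2ex]
A^{2} \times 2 \times \ln(L) \times \left[\dfrac{L^{2}}{L^{2}-1}\right] & \text{if } \alpha = 2, \\[2ex]
A^{2} \times \dfrac{L^{2}-1}{2\ln(L)} & \text{if } \alpha = 0.
\end{cases}
\tag{3}$$

The posterior predictive variance for a future record value $X$, given
$\alpha$, is therefore

$$V(k,\alpha) = f^{(2)}(k,\alpha) - [f(k,\alpha)]^{2}. \tag{4}$$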
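The second moment and the variance can be computed in the same way as the mean. The sketch below assumes the same truncated algebraic tail on $[A, LA]$; the names `tail_moment` and `predictive_variance` are hypothetical, and both moments are expressed through one helper integral rather than written out case by case.

```python
import math


def tail_moment(order, alpha, A, L):
    """E(X**order | A <= X <= L*A, alpha) for density proportional to t**(-alpha - 1)."""
    def power_integral(p):
        # Integral of u**(p - 1) over [1, L]; the p = 0 case is logarithmic.
        return math.log(L) if p == 0.0 else (L ** p - 1.0) / p

    return A ** order * power_integral(order - alpha) / power_integral(-alpha)


def predictive_variance(alpha, A, L):
    """V(k, alpha): second moment minus squared first moment."""
    m1 = tail_moment(1, alpha, A, L)
    m2 = tail_moment(2, alpha, A, L)
    return m2 - m1 ** 2


sd_L10 = math.sqrt(predictive_variance(0.10, 1.0, 10.0))   # in units of A
```

Writing every moment as a ratio of two power integrals handles the special values of $\alpha$ (0, 1, and 2) through the single logarithmic branch.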



It follows from (2) that for $\alpha > 1$, as $L \to \infty$ we have

$$f(k,\alpha) \sim A \times \frac{\alpha}{\alpha-1}. \tag{5}$$

When $\alpha > 1$, the right-hand side of (5) decreases from $\infty$ for
$\alpha = 1$ to the value $2 \times A$ when $\alpha = 2$, with the value
$3 \times A$ when $\alpha = 1.5$. Provided that $\alpha$ is bounded away
from 1 this expectation remains bounded.

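For $\alpha \le 2$, the posterior predictive variance goes to $\infty$ as
$L \to \infty$. If we define $\epsilon = 2 - \alpha > 0$ then for large $L$

$$f^{(2)}(k,\alpha) \approx A^{2} \times \alpha \times \frac{L^{\epsilon} - 1}{\epsilon}. \tag{6}$$

For each $L > 1$, and for $\epsilon > 0$, the function
$(L^{\epsilon} - 1)/\epsilon$ is monotonically increasing in $\epsilon$.
For $0 < \epsilon \le 2$ it has a maximum value of $(L^{2} - 1)/2$ when
$\epsilon = 2$, and an infimum of $\ln(L)$ as $\epsilon \to 0$. For large
$L$, as $\epsilon \to 0$ we see from Eqs. (3), (5), and (6), that

$$V(k,\alpha) \approx A^{2} \times \left\{\alpha \ln(L) - \left[\frac{\alpha}{\alpha-1}\right]^{2}\right\}. \tag{7}$$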



From Eq. (3), it follows that for $\alpha > 2$ the posterior predictive
variance remains bounded, and as $L \to \infty$ tends to the limiting value

$$A^{2} \times \frac{\alpha}{(\alpha-2)(\alpha-1)^{2}}. \tag{8}$$
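For readers who wish to verify the limiting variance, the algebra can be sketched as follows, assuming $\alpha > 2$ so that both conditional moments of the algebraic tail on $[A, LA]$ have finite limits as $L \to \infty$. The first line follows from evaluating the first and second conditional moments and letting $L \to \infty$.

```latex
\begin{align*}
\lim_{L\to\infty} E(X^{2} \mid A \le X \le LA, \alpha) &= A^{2}\,\frac{\alpha}{\alpha-2},
\qquad
\lim_{L\to\infty} E(X \mid A \le X \le LA, \alpha) = A\,\frac{\alpha}{\alpha-1}, \\
\lim_{L\to\infty} V(k,\alpha)
  &= A^{2}\left[\frac{\alpha}{\alpha-2} - \frac{\alpha^{2}}{(\alpha-1)^{2}}\right]
   = A^{2}\,\frac{\alpha\left[(\alpha-1)^{2} - \alpha(\alpha-2)\right]}{(\alpha-2)(\alpha-1)^{2}}
   = A^{2}\,\frac{\alpha}{(\alpha-2)(\alpha-1)^{2}} .
\end{align*}
```

For instance, $\alpha = 3$ gives a limiting variance of $0.75 \times A^{2}$.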



Now consider the forecasting of the maximum of $N$ future observations.
Define

$$M = \max\{X_{n+1}, \ldots, X_{n+N}\},$$

and let $\pi^{*}(\alpha, C)$ be the posterior distribution for
$\alpha, C$, based upon the data $D$. The likelihood function
$L(\alpha, C)$ of [1], when converted from lower tail to upper tail
inference, can be used to obtain this posterior distribution. For $t > A$,
we have

$$P(M > t \mid D) = \int\!\!\int \left[1 - (1 - C \times t^{-\alpha})^{N}\right] \pi^{*}(\alpha, C)\, d\alpha\, dC. \tag{9}$$



When $N = 1$ this gives the posterior predictive distribution for a single
new observation considered earlier, except that here we have not yet
conditioned upon $X \ge A$. Just as before, one can consider the posterior
moments of $M$, given that $M > A$. When $N$ is not small it is very
probable that $M \ge x_{(1)}$, so that a new record will be set. Thus for
large $N$ the predictive distribution of $M$ will be approximately the same
as the predictive distribution of $M$, given $M \ge x_{(1)}$.
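The predictive probability for the maximum of $N$ future observations can be approximated by replacing the posterior for $(\alpha, C)$ with a discrete set of weighted points. The sketch below is an illustration under that assumption; the function name `prob_max_exceeds`, the list-of-triples representation, and the example weights are hypothetical and are not derived from any data.

```python
def prob_max_exceeds(t, N, posterior):
    """P(M > t | D) for the maximum M of N future observations.

    `posterior` is a list of (weight, alpha, C) triples standing in for the
    posterior of (alpha, C); the tail model P(X > t | alpha, C) = C * t**(-alpha)
    is assumed to be valid at the point t.
    """
    total_weight = sum(w for w, _, _ in posterior)
    prob = 0.0
    for w, alpha, C in posterior:
        tail = C * t ** (-alpha)
        if not 0.0 <= tail <= 1.0:
            raise ValueError("C * t**(-alpha) must be a probability")
        # Given (alpha, C), the N future observations are treated as iid.
        prob += w * (1.0 - (1.0 - tail) ** N)
    return prob / total_weight


# Hypothetical two-point posterior, for illustration only.
posterior = [(0.5, 0.5, 0.1), (0.5, 1.5, 0.1)]
p_single = prob_max_exceeds(2.0, 1, posterior)
p_ten = prob_max_exceeds(2.0, 10, posterior)
```

With $N = 1$ the result reduces to the posterior average of the single-observation tail probability, and it increases with $N$, as one would expect for a maximum.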

In Table 1 we present for several values of $\alpha$ the predictive moments
as obtained by numerical integration. The predictive mean is denoted by
$E^{*}(X)$ and the predictive standard deviation by $SD^{*}(X)$. The column
labelled DIST gives the posterior predictive probability that $X$ is larger
than 2, 3, and 5 times $A$. Values of $\alpha$ go from .10 to 1.90, and
values for $L$ go from 1.25 to $10^{6}$. It can be checked that the above
asymptotic formulas hold quite closely for fixed $\alpha$.
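The first two predictive moments for fixed $\alpha$ can be recomputed from the closed forms, as a check on tabulated values. The sketch below is a hedged illustration: the helper names `conditional_moment` and `table_row` are hypothetical, moments are reported in units of $A$, and closed-form evaluation is used here in place of the numerical integration mentioned in the text.

```python
import math


def conditional_moment(order, alpha, L):
    """E((X/A)**order | A <= X <= L*A, alpha) under the algebraic tail."""
    def power_integral(p):
        # Integral of u**(p - 1) over [1, L]; the p = 0 case is logarithmic.
        return math.log(L) if p == 0.0 else (L ** p - 1.0) / p

    return power_integral(order - alpha) / power_integral(-alpha)


def table_row(alpha, L):
    """Return (predictive mean, predictive standard deviation), both divided by A."""
    mean = conditional_moment(1, alpha, L)
    sd = math.sqrt(conditional_moment(2, alpha, L) - mean ** 2)
    return mean, sd


rows = {L: table_row(0.10, L) for L in (1.25, 1.5, 2, 3, 5, 10, 100)}
for L, (mean, sd) in rows.items():
    print(f"alpha=0.10  L={L:<6}  E*(X)/A={mean:8.2f}  SD*(X)/A={sd:8.2f}")
```

Looping over a grid of $L$ values in this way is one simple means of examining the sensitivity of the forecast to the choice of $k = L \times A$.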

We see from Table 1 that the posterior expectation of $X$, given that
$X > A$, is only a few multiples of $A$, even when $\alpha$ is as small as
.10, provided that $L \le 10$. In an important class of application $A$ is
taken to be $x_{(1)}$, so that the real action takes place with regard to a
few multiples of the largest observation yet observed. When $L \le 2$ we
see that the value of $\alpha$ between .10 and 1.90 has very little effect
on the posterior predictive first and second moments. On the other hand,
when $L$ is very large the value of $\alpha$ has a huge effect. For
example, the posterior expectation drops from $37{,}297 \times A$ when
$L = 10^{6}$ to $2.11 \times A$, as $\alpha$ changes from .10 to 1.90. The
choice of $L$ can make a huge difference when $\alpha \le 1$. However, in
many applications of extreme value theory, it could safely be assumed that
$L \le 10$, in which case $L$ has only a minor effect even when
$\alpha \le 1$. The choice of $L$ has a greater effect with regard to the
predictive variance, but again if $L \le 10$ there is substantial
robustness.⁴ Thus the first conclusion that we draw is that in a real-world
problem, where there has been substantial data, such as with regard to wind
velocities, temperatures, etc., and where one does not take seriously the
possibility of the next record value being an enormous multiple of the
current maximum, the precise choice of $\alpha$ and $L$ has a limited
effect upon the forecast. This is precisely what we are aiming for, namely
an approach in which one can seriously input a priori knowledge regarding
$\alpha$ and $L$ in such a way as to see clearly the real but limited
effect of such choices.

Table 1 refers to the case of known α. In practice
α will ordinarily be unknown. The Bayesian approach
is to employ some a priori distribution π for
α, obtain the posterior distribution for α given D,
and then obtain the posterior expectation of X,
given that A ≤ X ≤ k. For a specified k, this
posterior expectation can be written as

$$f(k) = E[E(X \mid D, A \le X \le k, \alpha)] = E[f(k, \alpha)], \tag{10}$$

where the last expectation is taken with respect to
the posterior distribution of α. Similarly, the
posterior second moment for X is obtained by evaluating

$$f^{(2)}(k) = E[E(X^2 \mid D, A \le X \le k, \alpha)] = E[f^{(2)}(k, \alpha)]. \tag{11}$$

We employ the theory of [1] to obtain a likelihood
function for the parameter α based upon the
upper order statistics of the past data. We first
condition upon the upper r + 1 order statistics of
the data lying in the region where the tail is of



* See [30] for a general formulation of the robustness problem
in Bayesian statistics.

Table 1. Fixed ALPHA

ALPHA         PRED                     DIST            BOUND
    α      E*(X)      SD*(X)       2      3      5         L
  .10       1.12         .07     .93    .85    .79      1.25
  .10       1.23         .14     .93    .85    .79      1.50
  .10       1.44         .29     .93    .85    .79         2
  .10       1.80         .57     .93    .85    .79         3
  .10       2.43        1.12     .93    .85    .79         5
  .10       3.75        2.45     .93    .85    .79        10
  .10      18.70       23.46     .93    .85    .79       100
  .10     734.88     1715.25     .93    .85    .79      10^4
  .10   37297.27   1.28×10^5     .93    .85    .79      10^6
  .50       1.12         .07     .71    .45    .32      1.25
  .50       1.22         .14     .71    .45    .32      1.50
  .50       1.41         .28     .71    .45    .32         2
  .50       1.73         .56     .71    .45    .32         3
  .50       2.24        1.07     .71    .45    .32         5
  .50       3.16        2.22     .71    .45    .32        10
  .50      10.00       16.43     .71    .45    .32       100
  .50     100.05      571.77     .71    .45    .32      10^4
  .50    1001.62    18257.?6     .71    .45    .32      10^6
  .90       1.12         .07     .54    .23    .13      1.25
  .90       1.22         .14     .54    .23    .13      1.50
  .90       1.39         .28     .54    .23    .13         2
  .90       1.66         .54     .54    .23    .13         3
  .90       2.05        1.00     .54    .23    .13         5
  .90       2.67        1.93     .54    .23    .13        10
  .90       5.35       10.12     .54    .23    .13       100
  .90      13.62      142.79     .54    .23    .13      10^4
  .90      26.86     1806.66     .54    .23    .13      10^6
 1.10       1.12         .07     .47    .17    .08      1.25
 1.10       1.21         .14     .47    .17    .08      1.50
 1.10       1.38         .28     .47    .17    .08         2
 1.10       1.63         .53     .47    .17    .08         3
 1.10       1.97         .96     .47    .17    .08         5
 1.10       2.46        1.78     .47    .17    .08        10
 1.10       4.09        7.73     .47    .17    .08       100
 1.10       6.62       69.46     .47    .17    .08      10^4
 1.10       8.24      554.67     .47    .17    .08      10^6
 1.50       1.11         .07     .35    .09    .03      1.25
 1.50       1.21         .14     .35    .09    .03      1.50
 1.50       1.36         .27     .35    .09    .03         2
 1.50       1.57         .50     .35    .09    .03         3
 1.50       1.82         .87     .35    .09    .03         5
 1.50       2.12        1.49     .35    .09    .03        10
 1.50       2.70        4.44     .35    .09    .03       100
 1.50       2.97       16.98     .35    .09    .03      10^4
 1.50       3.00       54.72     .35    .09    .03      10^6
 1.90       1.11         .07     .27    .05    .01      1.25
 1.90       1.20         .14     .27    .05    .01      1.50
 1.90       1.34         .27     .27    .05    .01         2
 1.90       1.51         .48     .27    .05    .01         3
 1.90       1.69         .78     .27    .05    .01         5
 1.90       1.87        1.22     .27    .05    .01        10
 1.90       2.08        2.61     .27    .05    .01       100
 1.90       2.11        4.93     .27    .05    .01      10^4
 1.90       2.11        7.23     .27    .05    .01      10^6

algebraic form, i.e., larger than D of [1], and then
condition upon the values of the ratios of upper
order statistics v_i = x_(i)/x_(i+1), for i = 1, ..., r. As shown
in [1], if we are indeed in the upper tail of the
distribution where the algebraic form holds, then
conditional upon α, the quantities e_i = i × ln v_i are
independent with a common exponential distribution
having parameter α. A sufficient statistic for
α, conditional upon the v_i and r, is then

$$t = t(r) = \sum_{i=1}^{r} e_i. \tag{12}$$

The (conditional) likelihood function based upon r
and t is then

$$L(\alpha) \propto \alpha^{r} \times \exp[-\alpha t], \tag{13}$$

for α > 0. In conjunction with some a priori distribution
for α this likelihood function can be used to
obtain the posterior distribution for α. If k is large
and α > 1, we see from (5) that

$$E(X \mid D, A \le X \le k, \alpha) \approx A \times \frac{\alpha}{\alpha - 1}. \tag{14}$$

In general, the predictive moments of X can only
be obtained by numerical integration. In Sec. 4 we
examine the sensitivity of such quantities to the
data, choice of L, and choice of a priori distribution
for α. The case k = ∞, however, has a closed form
analytic solution for a Gamma a priori distribution
of α, and this contributes some insight into the
behavior of the posterior moments of X.
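
The statistic of Eq. (12) and the likelihood of Eq. (13) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample data are made up, and the choice of r (how many upper order statistics lie in the algebraic tail) is left to the user.

```python
import math

def hill_statistic(data, r):
    """t = sum over i = 1..r of i * ln(x_(i) / x_(i+1)), where
    x_(1) >= x_(2) >= ... are the order statistics in descending order."""
    xs = sorted(data, reverse=True)
    if r < 1 or r + 1 > len(xs):
        raise ValueError("need at least r + 1 observations and r >= 1")
    return sum(i * math.log(xs[i - 1] / xs[i]) for i in range(1, r + 1))

def log_likelihood(alpha, r, t):
    """Logarithm of alpha**r * exp(-alpha * t), up to an additive constant."""
    return r * math.log(alpha) - alpha * t

sample = [8.0, 1.0, 4.0, 2.0]        # hypothetical positive observations
r = 3
t = hill_statistic(sample, r)
print(f"t = {t:.4f}, maximizing alpha = r/t = {r / t:.4f}")
```

Since the logarithm of the likelihood is r ln α − αt, it is maximized at α = r/t, which is a convenient summary of the data when reading the r and t columns of the later tables.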



3. k = ∞

In this section we examine the special case in
which the distribution is known to be algebraic
everywhere beyond A. In this case, in order for
posterior moments to be finite, we will have to assume
that α is sufficiently large. It follows from Eq. (1)
that the posterior expectation of X_(n+1), given that it
is in the upper tail and α, is finite if and only if
α > 1. In the Bayesian analysis, with an a priori
distribution for α, the unconditional posterior
expectation of X is finite if and only if the a priori
distribution sufficiently downweights values of α
near 1.

We can gain some insight by supposing that α > 1
has the prior distribution

$$\pi(\alpha) = c \times (\alpha - 1)^{\delta - 1} \exp[-\beta(\alpha - 1)],$$

for δ, β > 0, where c = β^δ/Γ(δ) is a proportionality
constant. In other words, we give α − 1 > 0 a
Gamma a priori distribution. If δ > 1 we obtain
from Eq. (1) that the posterior expectation of X/A,
given X ≥ A, is

$$E(X/A \mid D) = \frac{\int_0^\infty (1+s)^{r+1} \times s^{\delta-2} \times \exp[-(t+\beta)s]\,ds}{\int_0^\infty (1+s)^{r} \times s^{\delta-1} \times \exp[-(t+\beta)s]\,ds}, \tag{15}$$

where s = α − 1. This expectation is finite provided that δ > 1.

For positive integral values of r we can expand
the powers of 1 + s using the binomial theorem, and
this allows us to make explicit evaluations. To
illustrate, if r = 1, as in the forecasting of city sizes in
Tables 6 and 7, we have

$$E\left[\frac{\alpha}{\alpha-1} \,\Big|\, D\right] = \frac{t+\beta}{\delta-1} \times \frac{1 + 2(\delta-1)/(t+\beta) + \delta(\delta-1)/(t+\beta)^2}{1 + \delta/(t+\beta)}. \tag{16}$$

This reveals the manner in which the expectation
blows up as δ → 1. When δ = 2, the right-hand side
can be written as

$$t + \beta + \frac{2}{t + \beta + 2}.$$

For t + β = 1, we obtain the value 1.67. This is
comparable with the values in Tables 2, 3, and 4, when
r = t = 1 and L ≤ 5. For r = 1 and δ = 2, f(k) is
approximately (1 + t + β) × A, provided that t + β is
sufficiently large. Similarly, other integral values of
r yield closed form expressions, which provide some
insight as to the behavior of the posterior expectation
of X.
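
The binomial expansion mentioned above can be carried out mechanically for any non-negative integer r. The following sketch evaluates the ratio of integrals in Eq. (15) term by term, using the fact that the integral of s^(a-1) exp(-bs) over s > 0 equals Γ(a)/b^a. The function name is an assumption made for illustration, not something taken from the paper.

```python
import math

def posterior_mean_ratio(r, t, delta, beta):
    """E(X/A | D) for k = infinity: ratio of the two integrals in Eq. (15),
    evaluated by expanding (1 + s)**(r + 1) and (1 + s)**r binomially.
    Requires integer r >= 0 and delta > 1."""
    if delta <= 1:
        raise ValueError("the expectation is infinite unless delta > 1")
    b = t + beta

    def gamma_integral(a):
        # integral over s > 0 of s**(a - 1) * exp(-b * s)
        return math.gamma(a) / b ** a

    numerator = sum(math.comb(r + 1, j) * gamma_integral(delta - 1 + j)
                    for j in range(r + 2))
    denominator = sum(math.comb(r, j) * gamma_integral(delta + j)
                      for j in range(r + 1))
    return numerator / denominator

# r = 1, delta = 2, t + beta = 1 (here split arbitrarily as 0.5 + 0.5)
print(round(posterior_mean_ratio(1, 0.5, 2, 0.5), 2))   # 1.67
```

For r = 1 the result agrees with the closed form of Eq. (16); for r = 0 it reduces to 1 + (t + β)/(δ − 1).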

From Eqs. (3) and (11), the posterior predictive
second moment for X, given that X ≥ A, is

$$f^{(2)}(k) = A^2 \times E\left[\frac{\alpha}{\alpha - 2} \,\Big|\, D, X \ge A\right]. \tag{17}$$

If α > 2 and the a priori distribution for α − 2 is of
the Gamma form, with parameters δ, β, the posterior
predictive variance for X will be finite, provided
that δ > 1. Closed form expressions can be
obtained when r is a positive integer, just as with
the corresponding predictive first moment.

Table 2. Uniform prior, LB = 1.001, UB = 1.999, prior mean = 1.50, SD = .29

  DATA         POST               PRED               DIST         BOUND
   r    t   E*(α)  SD*(α)     E*(X)    SD*(X)     2     5    10       L
   1    1    1.47     .29      1.11       .07   .37   .10   .04    1.25
   1    1    1.47     .29      1.21       .14   .37   .10   .04    1.50
   1    1    1.47     .29      1.36       .27   .37   .10   .04       2
   1    1    1.47     .29      1.58       .51   .37   .10   .04       3
   1    1    1.47     .29      1.84       .88   .37   .10   .04       5
   1    1    1.47     .29      2.16      1.54   .37   .10   .04      10
   1    1    1.47     .29      2.96      5.37   .37   .10   .04     100
   1    1    1.47     .29      3.82     ?8.95   .37   .10   .04    10^4
   1    1    1.47     .29      4.30    305.19   .37   .10   .04    10^6
   3    2    1.50     .28      1.11       .07   .36   .10   .04    1.25
   3    2    1.50     .28      1.21       .14   .36   .10   .04    1.50
   3    2    1.50     .28      1.36       .27   .36   .10   .04       2
   3    2    1.50     .28      1.57       .51   .36   .10   .04       3
   3    2    1.50     .28      1.83       .88   .36   .10   .04       5
   3    2    1.50     .28      2.14      1.52   .36   .10   .04      10
   3    2    1.50     .28      2.88      5.16   .36   .10   .04     100
   3    2    1.50     .28      3.63     35.99   .36   .10   .04    10^4
   3    2    1.50     .28      4.03    276.83   .36   .10   .04    10^6
   2    3    1.37     .27      1.11       .07   .39   .12   .05    1.25
   2    3    1.37     .27      1.21       .14   .39   .12   .05    1.50
   2    3    1.37     .27      1.37       .28   .39   .12   .05       2
   2    3    1.37     .27      1.59       .51   .39   .12   .05       3
   2    3    1.37     .27      1.87       .90   .39   .12   .05       5
   2    3    1.37     .27      2.24      1.61   .39   .12   .05      10
   2    3    1.37     .27      3.22      5.99   .39   .12   .05     100
   2    3    1.37     .27      4.43     46.85   .39   .12   .05    10^4
   2    3    1.37     .27      5.16    377.37   .39   .12   .05    10^6
   5    1    1.67     .25      1.11       .07   .32   .07   .03    1.25
   5    1    1.67     .25      1.21       .14   .32   .07   .03    1.50
   5    1    1.67     .25      1.35       .27   .32   .07   .03       2
   5    1    1.67     .25      1.55       .49   .32   .07   .03       3
   5    1    1.67     .25      1.77       .84   .32   .07   .03       5
   5    1    1.67     .25      2.02      1.40   .32   .07   .03      10
   5    1    1.67     .25      2.49      4.05   .32   .07   .03     100
   5    1    1.67     .25      2.80     21.99   .32   .07   .03    10^4
   5    1    1.67     .25      2.93    152.19   .32   .07   .03    10^6
   1    5    1.22     .20      1.11       .07   .43   .15   .07    1.25
   1    5    1.22     .20      1.21       .14   .43   .15   .07    1.50
   1    5    1.22     .20      1.37       .28   .43   .15   .07       2
   1    5    1.22     .20      1.61       .52   .43   .15   .07       3
   1    5    1.22     .20      1.93       .93   .43   .15   .07       5
   1    5    1.22     .20         ?      1.71   .43   .15   .07      10
   1    5    1.22     .20      3.6?         ?   .43   .15   .07     100
   1    5    1.22     .20      5.61     60.75   .43   .15   .07    10^4
   1    5    1.22     .20         ?    512.81   .43   .15   .07    10^6
  30   20    1.52     .23      1.11       .07   .35   .09   .03    1.25
  30   20    1.52     .23      1.21       .14   .35   .09   .03    1.50
  30   20    1.52     .23         ?       .27   .35   .09   .03       2
  30   20    1.52     .23      1.57       .50   .35   .09   .03       3
  30   20    1.52     .23      1.82       .87   .35   .09   .03       5
  30   20    1.52     .23      2.12      1.50   .35   .09   .03      10
  30   20    1.52     .23      2.7?      4.81   .35   .09   .03     100
  30   20    1.52     .23      3.29     28.02   .35   .09   .03    10^4
  30   20    1.52     .23      3.48    185.91   .35   .09   .03    10^6
  20   30    1.08     .07      1.12       .07   .47   .18   .08    1.25
  20   30    1.08     .07      1.22       .14   .47   .18   .08    1.50
  20   30    1.08     .07      1.38       .28   .47   .18   .08       2
  20   30    1.08     .07      1.63       .53   .47   .18   .08       3
  20   30    1.08     .07      1.9?       .96   .47   .18   .08       5
  20   30    1.08     .07      2.48      1.80   .47   .18   .08      10
  20   30    1.08     .07      4.21      8.01   .47   .18   .08     100
  20   30    1.08     .07      7.27     78.63   .47   .18   .08    10^4
  20   30    1.08     .07      9.71    706.23   .47   .18   .08    10^6
 300  200    1.50     .09      1.11       .07   .35   .09   .03    1.25
 300  200    1.50     .09      1.21       .14   .35   .09   .03    1.50
 300  200    1.50     .09      1.36       .27   .35   .09   .03       2
 300  200    1.50     .09      1.57       .50   .35   .09   .03       3
 300  200    1.50     .09      1.82       .87   .35   .09   .03       5
 300  200    1.50     .09      2.12      1.49   .35   .09   .03      10
 300  200    1.50     .09      2.71      4.48   .35   .09   .03     100
 300  200    1.50     .09      3.00     18.24   .35   .09   .03    10^4
 300  200    1.50     .09      3.04     67.25   .35   .09   .03    10^6
 200  300    1.01     .01      1.12       .07   .50   .20   .10    1.25
 200  300    1.01     .01      1.22       .14   .50   .20   .10    1.50
 200  300    1.01     .01      1.39       .28   .50   .20   .10       2
 200  300    1.01     .01      1.65       .53   .50   .20   .10       3
 200  300    1.01     .01      2.01       .97   .50   .20   .10       5
 200  300    1.01     .01         ?      1.85   .50   .20   .10      10
 200  300    1.01     .01      4.59      8.73   .50   .20   .10     100
 200  300    1.01     .01      8.88     95.94   .50   .20   .10    10^4
 200  300    1.01     .01     13.01    941.80   .50   .20   .10    10^6



4. k < ∞

One of our purposes in this article is to show
that prediction can be very sensitive to the a priori
information introduced regarding L, and that it is
essential to incorporate strong a priori information
as to the magnitude of this quantity in order to
obtain realistic forecasts. No closed form results
are available apart from those of the last section.
We consider now various a priori distributions for
α. In the previous analysis it was not possible to
give α a uniform distribution, since this would
require β = 0 and δ = 1, in which case with infinite k
the expectation is infinite. However, with a finite
upper bound for X, we obtain a finite expectation
for any α ≥ 0, and in fact even for negative α,
although this case is of little interest.

Table 2 displays results for the case of a uniform
a priori distribution for α, using a finite grid of
possible values for α between LB = 1.001 and
UB = 1.999, several values of r and t, and several
choices of L. The prior expectation and standard
deviation for α are 1.50 and .29, respectively. Table
3 gives such results for a uniform a priori distribution,
using a finite grid of values between LB = .001
and UB = 1.999, in which case the prior expectation
and standard deviation for α are 1.00 and .58,
respectively. In these tables the column labelled
"POST" gives the posterior expectation and standard
deviation for α, the column labelled "PRED"
gives the posterior predictive expectation and standard
deviation for the next observation X, and the
column labelled "DIST" gives the posterior probability
that X is larger than 2, 5, and 10 times A,
respectively.
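
The kind of grid computation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the grid step of .001, the function names, and the truncated algebraic form used for the conditional moments of X/A given α are all choices made here. The posterior weights follow the likelihood α^r exp(−αt) with a uniform prior on the grid, and the predictive moments average the conditional moments over those weights, as in Eqs. (10) and (11).

```python
import math

def truncated_moment(alpha, L, order):
    """order-th moment of X/A, assuming density proportional to
    x**(-alpha - 1) on [1, L] (alpha > 0)."""
    log_L = math.log(L)
    norm = -math.expm1(-alpha * log_L)            # 1 - L**(-alpha)
    p = order - alpha
    integral = alpha * log_L if p == 0 else alpha * math.expm1(p * log_L) / p
    return integral / norm

def grid_forecast(r, t, L, lb=1.001, ub=1.999, step=0.001):
    """Posterior mean/SD of alpha and predictive mean/SD of X/A
    for a uniform prior on an evenly spaced grid from lb to ub."""
    n = int(round((ub - lb) / step)) + 1
    grid = [lb + i * step for i in range(n)]
    # uniform prior, so posterior weight is proportional to the likelihood
    weights = [a ** r * math.exp(-a * t) for a in grid]
    total = sum(weights)
    weights = [w / total for w in weights]
    post_mean = sum(w * a for w, a in zip(weights, grid))
    post_var = sum(w * (a - post_mean) ** 2 for w, a in zip(weights, grid))
    m1 = sum(w * truncated_moment(a, L, 1) for w, a in zip(weights, grid))
    m2 = sum(w * truncated_moment(a, L, 2) for w, a in zip(weights, grid))
    return post_mean, math.sqrt(post_var), m1, math.sqrt(m2 - m1 * m1)

post_mean, post_sd, pred_mean, pred_sd = grid_forecast(r=3, t=2, L=5)
print(f"E*(alpha)={post_mean:.2f} SD*(alpha)={post_sd:.2f} "
      f"E*(X)/A={pred_mean:.2f} SD*(X)/A={pred_sd:.2f}")
```

Changing lb to .001 gives the wider uniform prior of Table 3; the values of r, t and L are then the only other inputs.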

Table 3. Uniform prior, LB = 0.001, UB = 1.999, prior mean = 1.00, SD = .58

  DATA         POST               PRED               DIST         BOUND
   r    t   E*(α)  SD*(α)     E*(X)    SD*(X)     2     5    10       L
   1    1    1.09     .51      1.12       .07   .50   .24   .15    1.25
   1    1    1.09     .51      1.21       .14   .50   .24   .15    1.50
   1    1    1.09     .51      1.38       .28   .50   .24   .15       2
   1    1    1.09     .51      1.64       .53   .50   .24   .15       3
   1    1    1.09     .51      1.99       .98   .50   .24   .15       5
   1    1    1.09     .51      2.55      1.90   .50   .24   .15      10
   1    1    1.09     .51      5.71     11.70   .50   .24   .15     100
   1    1    1.09     .51     59.13    474.46   .50   .24   .15    10^4
   1    1    1.09     .51   1599.34  26459.07   .50   .24   .15    10^6
   3    2    1.31     .42      1.11       .07   .42   .15   .08    1.25
   3    2    1.31     .42      1.21       .14   .42   .15   .08    1.50
   3    2    1.31     .42      1.37       .28   .42   .15   .08       2
   3    2    1.31     .42      1.60       .52   .42   .15   .08       3
   3    2    1.31     .42      1.90       .93   .42   .15   .08       5
   3    2    1.31     .42      2.?2      1.70   .42   .15   .08      10
   3    2    1.31     .42      3.98      8.23   .42   .15   .08     100
   3    2    1.31     .42     15.97    209.46   .42   .15   .08    10^4
   3    2    1.31     .42    187.75   8561.51   .42   .15   .08    10^6
   2    3     .90     .44      1.12       .07   .56   .29   .19    1.25
   2    3     .90     .44      1.22       .14   .56   .29   .19    1.50
   2    3     .90     .44      1.39       .28   .56   .29   .19       2
   2    3     .90     .44      1.67       .54   .56   .29   .19       3
   2    3     .90     .44      2.07      1.01   .56   .29   .19       5
   2    3     .90     .44      2.72      2.01   .56   .29   .19      10
   2    3     .90     .44      6.76     13.07   .56   .29   .19     100
   2    3     .90     .44     70.77    511.32   .56   .29   .19    10^4
   2    3     .90     .44   1619.79  26176.42   .56   .29   .19    10^6
   5    1    1.64     .29      1.11       .07   .33   .08   .03    1.25
   5    1    1.64     .29      1.21       .14   .33   .08   .03    1.50
   5    1    1.64     .29      1.35       .27   .33   .08   .03       2
   5    1    1.64     .29      1.55       .50   .33   .08   .03       3
   5    1    1.64     .29      1.78       .85   .33   .08   .03       5
   5    1    1.64     .29      2.04      1.43   .33   .08   .03      10
   5    1    1.64     .29      2.62      4.57   .33   .08   .03     100
   5    1    1.64     .29      3.57     50.32   .33   .08   .03    10^4
   5    1    1.64     .29      7.89   1247.04   .33   .08   .03    10^6
   1    5     .40     .28      1.12       .07   .77   .57   .47    1.25
   1    5     .40     .28      1.23       .14   .77   .57   .47    1.50
   1    5     .40     .28      1.42       .29   .77   .57   .47       2
   1    5     .40     .28      1.75       .56   .77   .57   .47       3
   1    5     .40     .28      2.29      1.09   .77   .57   .47       5
   1    5     .40     .28      3.33      2.31   .77   .57   .47      10
   1    5     .40     .28     12.74     19.46   .77   .57   .47     100
   1    5     .40     .28    309.24   1125.56   .77   .57   .47    10^4
   1    5     .40     .28  11841.58  72682.37   .77   .57   .47    10^6
  30   20    1.51     .24      1.11       .07   .36   .09   .04    1.25
  30   20    1.51     .24      1.21       .14   .36   .09   .04    1.50
  30   20    1.51     .24      1.36       .27   .36   .09   .04       2
  30   20    1.51     .24      1.57       .50   .36   .09   .04       3
  30   20    1.51     .24      1.82       .87   .36   .09   .04       5
  30   20    1.51     .24      2.13      1.50   .36   .09   .04      10
  30   20    1.51     .24      2.81      4.92   .36   .09   .04     100
  30   20    1.51     .24      3.42     32.25   .36   .09   .04    10^4
  30   20    1.51     .24      3.80    284.87   .36   .09   .04    10^6
  20   30     .70     .15      1.12       .07   .62   .33   .21    1.25
  20   30     .70     .15      1.22       .14   .62   .33   .21    1.50
  20   30     .70     .15      1.40       .28   .62   .33   .21       2
  20   30     .70     .15      1.70       .55   .62   .33   .21       3
  20   30     .70     .15      2.14      1.03   .62   .33   .21       5
  20   30     .70     .15      2.91      2.09   .62   .33   .21      10
  20   30     .70     .15      7.49     13.50   .62   .33   .21     100
  20   30     .70     .15     46.87    366.80   .62   .33   .21    10^4
  20   30     .70     .15    357.77  10632.53   .62   .33   .21    10^6
 300  200    1.50     .09      1.11       .07   .35   .09   .03    1.25
 300  200    1.50     .09      1.21       .14   .35   .09   .03    1.50
 300  200    1.50     .09      1.36       .27   .35   .09   .03       2
 300  200    1.50     .09      1.57       .50   .35   .09   .03       3
 300  200    1.50     .09      1.82       .87   .35   .09   .03       5
 300  200    1.50     .09      2.12      1.49   .35   .09   .03      10
 300  200    1.50     .09      2.71      4.48   .35   .09   .03     100
 300  200    1.50     .09      3.00     18.24   .35   .09   .03    10^4
 300  200    1.50     .09      3.04     67.25   .35   .09   .03    10^6
 200  300     .67     .05      1.12       .07   .63   .34   .22    1.25
 200  300     .67     .05      1.22       .14   .63   .34   .22    1.50
 200  300     .67     .05      1.40       .28   .63   .34   .22       2
 200  300     .67     .05      1.70       .55   .63   .34   .22       3
 200  300     .67     .05      2.16      1.04   .63   .34   .22       5
 200  300     .67     .05      2.94      2.10   .63   .34   .22      10
 200  300     .67     .05      7.62     13.58   .63   .34   .22     100
 200  300     .67     .05     41.51    330.65   .63   .34   .22    10^4
 200  300     .67     .05    212.71   7448.90   .63   .34   .22    10^6

So far we have only considered very strong a
priori knowledge, such as in Table 1 where α is
known, and very weak a priori knowledge, such as
the uniform distributions of Tables 2 and 3. In
applications it is important also to be able to input an
a priori distribution for α in which some values are
singled out as being given substantially more
weight than others. A useful family of a priori
distributions for α for this purpose is the
three-parameter log-normal family. Suppose that
ln(α − γ) ~ N(μ, σ²). This is the three-parameter
log-normal distribution with threshold parameter
γ, and is a very convenient and interesting family
with which to make inference about α. See
Aitchison and Brown [31], and Hill [32] for some
properties of this distribution. The integrations in
this case again have to be done by numerical analysis.
In Table 4 we present results for the case γ = 1,
with α taking values between LB = 1.001 and
UB = 10. The prior mean and standard deviation
for α are 1.50 and .61, respectively.
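
A three-parameter log-normal prior of this kind is easy to place on a grid. The sketch below is illustrative only: the grid step of .001 and the function names are assumptions, while γ = 1, μ = −1.19, σ = 1, LB = 1.001 and UB = 10 are the settings quoted in the heading of Table 4. Without truncation the prior mean is γ + exp(μ + σ²/2).

```python
import math

def lognormal_prior_grid(gamma=1.0, mu=-1.19, sigma=1.0,
                         lb=1.001, ub=10.0, step=0.001):
    """Grid of alpha values with normalized prior weights, assuming
    ln(alpha - gamma) ~ N(mu, sigma**2) truncated to [lb, ub]."""
    n = int(round((ub - lb) / step)) + 1
    grid = [lb + i * step for i in range(n)]
    dens = []
    for a in grid:
        s = a - gamma                              # must be positive
        z = (math.log(s) - mu) / sigma
        dens.append(math.exp(-0.5 * z * z) / (s * sigma * math.sqrt(2 * math.pi)))
    total = sum(dens)
    return grid, [d / total for d in dens]

def mean_sd(grid, weights):
    """Mean and standard deviation of a discrete distribution."""
    m = sum(w * a for w, a in zip(weights, grid))
    v = sum(w * (a - m) ** 2 for w, a in zip(weights, grid))
    return m, math.sqrt(v)

grid, weights = lognormal_prior_grid()
prior_mean, prior_sd = mean_sd(grid, weights)
untruncated_mean = 1.0 + math.exp(-1.19 + 0.5)
print(f"grid mean={prior_mean:.2f} grid SD={prior_sd:.2f} "
      f"untruncated mean={untruncated_mean:.2f}")
```

The weights returned here can be multiplied by the likelihood α^r exp(−αt) and renormalized to give posterior weights on the same grid.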



5. Discussion of Tables 

If α > 2 then for fixed known α there is no problem
with infinite first and second moments. This is
also the case when α is unknown, except that the a
priori distribution for α must give sufficiently small
weight to values near 2 in order that the second
moment be finite. However, the case α > 2,
although of some interest, does not deal with truly
long-tailed distributions. For α > 1, and using a
Gamma prior distribution for α − 1 with δ > 1, as
k → ∞ the posterior moments of X converge to the
limiting results discussed in Sec. 3, such as in Eq.
(16). We observe, however, that the convergence is
quite slow. For values of k in the practical range,
say L ≤ 10, the results are not very sensitive to the
precise value of L, but are quite different from the
limiting results, because the convergence is so slow.
For example, the theoretical value for the
multiplier of A when r = 0, t = 1, δ = 2, β = 1, is 3. Using
UB = 10, when L = 10" the calculated value for this 
multiplier is 2.86, and it is still only 2.98 when 
L = \(f\ For L ^ 10^ however, the multiplier is less 
than 2.16, and for values L ≤ 10, it is at most 2.
Thus even in this case, where the posterior
expectation exists for k = ∞, it can still be important to
use a realistic value for L. Although this case can
be described as a genuine long-tailed distribution,
in order for the posterior expectation of X to be
finite when k = ∞, it is necessary to take δ > 1, and
so the a priori expectation for α must be larger than
1 + 1/β.

Table 4. Log-normal prior, LB = 1.001, UB = 10, γ = 1, μ = −1.19, σ = 1, prior mean = 1.50, SD = .61

  DATA         POST               PRED               DIST         BOUND
   r    t   E*(α)  SD*(α)     E*(X)    SD*(X)     2     5    10       L
   1    1    1.39     .38      1.11       .07   .39   .12   .05    1.25
   1    1    1.39     .38      1.21       .14   .39   .12   .05    1.50
   1    1    1.39     .38      1.37       .28   .39   .12   .05       2
   1    1    1.39     .38      1.59       .51   .39   .12   .05       3
   1    1    1.39     .38      1.87       .90   .39   .12   .05       5
   1    1    1.39     .38      2.24      1.62   .39   .12   .05      10
   1    1    1.39     .38      3.2?      6.09   .39   .12   .05     100
   1    1    1.39     .38      4.48     46.61   .39   .12   .05    10^4
   1    1    1.39     .38      5.15    353.95   .39   .12   .05    10^6
   3    2    1.41     .38      1.11       .07   .39   .12   .05    1.25
   3    2    1.41     .38      1.21       .14   .39   .12   .05    1.50
   3    2    1.41     .38      1.36       .27   .39   .12   .05       2
   3    2    1.41     .38      1.59       .51   .39   .12   .05       3
   3    2    1.41     .38      1.86       .90   .39   .12   .05       5
   3    2    1.41     .38      2.22      1.60   .39   .12   .05      10
   3    2    1.41     .38      3.18      5.92   .39   .12   .05     100
   3    2    1.41     .38      4.30     44.35   .39   .12   .05    10^4
   3    2    1.41     .38      4.90    332.81   .39   .12   .05    10^6
   2    3    1.27     .23      1.11       .07   .42   .14   .06    1.25
   2    3    1.27     .23      1.21       .14   .42   .14   .06    1.50
   2    3    1.27     .23      1.37       .28   .42   .14   .06       2
   2    3    1.27     .23      1.61       .52   .42   .14   .06       3
   2    3    1.27     .23      1.91       .92   .42   .14   .06       5
   2    3    1.27     .23      2.31      1.67   .42   .14   .06      10
   2    3    1.27     .23      3.50      6.57   .42   .14   .06     100
   2    3    1.27     .23      5.02     52.43   .42   .14   .06    10^4
   2    3    1.27     .23      5.02     52.43   .42   .14   .06    10^6
   5    1    2.34    1.17      1.11       .07   .25   .06   .02    1.25
   5    1    2.34    1.17      1.20       .14   .25   .06   .02    1.50
   5    1    2.34    1.17      1.32       .26   .25   .06   .02       2
   5    1    2.34    1.17      1.48       .47   .25   .06   .02       3
   5    1    2.34    1.17      1.65       .77   .25   .06   .02       5
   5    1    2.34    1.17      1.83      1.26   .25   .06   .02      10
   5    1    2.34    1.17      2.22      3.82   .25   .06   .02     100
   5    1    2.34    1.17      2.58     24.18   .25   .06   .02    10^4
   5    1    2.34    1.17      2.74    170.55   .25   .06   .02    10^6
   1    5    1.18     .14      1.11       .07   .44   .15   .07    1.25
   1    5    1.18     .14      1.21       .14   .44   .15   .07    1.50
   1    5    1.18     .14      1.38       .28   .44   .15   .07       2
   1    5    1.18     .14      1.62       .52   .44   .15   .07       3
   1    5    1.18     .14      1.94       .94   .44   .15   .07       5
   1    5    1.18     .14      2.38      1.73   .44   .15   .07      10
   1    5    1.18     .14      3.77      7.11   .44   .15   .07     100
   1    5    1.18     .14      5.73     60.45   .44   .15   .07    10^4
   1    5    1.18     .14      6.95    453.26   .44   .15   .07    10^6
  30   20    1.40     .22      1.11       .07   .38   .11   .04    1.25
  30   20    1.40     .22      1.21       .14   .38   .11   .04    1.50
  30   20    1.40     .22      1.36       .27   .38   .11   .04       2
  30   20    1.40     .22         ?       .51   .38   .11   .04       3
  30   20    1.40     .22      1.86       .90   .38   .11   .04       5
  30   20    1.40     .22      2.21      1.58   .38   .11   .04      10
  30   20    1.40     .22      3.09      5.58   .38   .11   .04     100
  30   20    1.40     .22      3.92     36.97   .38   .11   .04    10^4
  30   20    1.40     .22      4.28    252.19   .38   .11   .04    10^6
  20   30    1.10     .07      1.1?       .07   .47     ?   .08    1.25
  20   30    1.10     .07      1.21       .14   .47     ?   .08    1.50
  20   30    1.10     .07      1.38       .28   .47     ?   .08       2
  20   30    1.10     .07      1.63       .53   .47     ?   .08       3
  20   30    1.10     .07      1.97       .96   .47     ?   .08       5
  20   30    1.10     .07      2.46      1.78   .47     ?   .08      10
  20   30    1.10     .07      4.09      7.76   .47     ?   .08     100
  20   30    1.10     .07      6.75     71.88   .47     ?   .08    10^4
  20   30    1.10     .07      8.61    607.81   .47     ?   .08    10^6



A case of substantial practical importance is that
in which the a priori information about α is weak,
apart from the knowledge that 1 < α ≤ 2. There is
substantial empirical data on incomes, stock-market
prices, city sizes, the distribution of biological
genera and species, and many other variables, for
which α ≤ 2. See Yule [15] and Zipf [14]. However,
there is no known theoretical reason for taking the
a priori distribution of α to be of the Gamma form,
or for taking δ > 1. In the case of weak a priori
information, the likelihood function is approximately
proportional to the posterior density for α. See the
stable estimation argument of Savage [33] and
Edwards, Lindman and Savage [34]. For either
classical statisticians, to whom the a priori distribution is
non-existent or "unknown," or Bayesians who
prefer to use some form of "uninformative" prior
distribution, the results of Table 2 should be quite
reassuring. It is possible, despite the delicacy at ∞,
to obtain robust answers. It may be noted in this
table that typically the posterior predictive
expectation of X_(n+1), given that it is between x_(1) and
10 × x_(1), is some modest multiple of the largest
observation, at most 3 × x_(1); and it is at most 5 × x_(1)
when L ≤ 100. This is as it should be. One does not,
for example, anticipate wind strengths that are
some enormous factor times the largest yet
experienced, even given that we set a new record wind
strength. By comparing Table 1 for α = 1.50 known,
with Table 2 for the case r = 3, t = 2, we see that
there is little sensitivity in either the predictive
moments or the predictive probabilities. For example,
when L = 5, Table 1 gives predictive moments of
1.82 and .87, and predictive probabilities of .35, .09,
and .03; while Table 2 gives predictive moments of
1.83 and .88, and predictive probabilities of .36, .10,
and .04. The greatest discrepancies occur for very
large values of L, such as 10^6, which are
inappropriate for most real-world applications.



Another case of substantial interest is that in
which α is uniform from 0 to 2, so that even more
extreme long-tailed behavior is possible. Again
results are not very sensitive to the choice of a priori
distribution, provided that L is not too large. For
example, Table 3 with r = 3, t = 2, L = 5, gives the
predictive moments as 1.90 and .93, and the
predictive probabilities as .42, .15, and .08. Although
there is a real change from the results of Tables 1
and 2, it is of limited extent, and is in the direction
of making the predictive distribution longer-tailed,
as was to be expected. If anything, one might be
surprised that allowing α to get close to 0, as with
this a priori distribution, did not move the
predictive distribution much further to the right.

The final case of great interest is where some
definite a priori information is input, as we do here
with the log-normal distribution. Table 4, for the
case γ = 1, r = 3, t = 2, L = 5, gives 1.86 and .90 as
predictive moments, and .39, .12, and .05, as
predictive probabilities. These results are close to
those of Table 2, in which α has the same a priori
expectation as in Table 4.

The reader may compare these various tables for other
values of the parameters, to examine the effect of
long-tailed sample data, greater sample sizes, cases where
the a priori information is less concordant with the data,
and the effect of L. For example, in Table 3 with r = 2,
t = 3, so that the estimate of α is r/t = .67, and L = 5,
the predictive moments are 2.07 and 1.01, while the
predictive probabilities are .56, .29, and .19. Again,
provided that a realistic upper bound for L is chosen, such
as 10, the changes from previous values are real but of
limited magnitude, and in the direction to be anticipated.

Armed with this information, let us now examine 
real-world data on city sizes. Table 5 gives the sizes 
of the 30 largest cities in the United States in 1940 
and 1988. They are first presented in descending order,
and then in a randomly chosen permutation.

Table 5. City size × 10^-3 data

          1940                    1988
   Ordered   Permuted      Ordered   Permuted
     7455      1931          7353       987
     3397       859          3353       727
     1931     302.3          2978       532
     1623       305          1698      1647
     1504      3397          1647       522
      878       368          1070       599
      859       816          1036      2978
      816       587           987       465
      771       399           941      1036
      672      1623           924       502
      663       456           751       434
      635       387           738       511
      587       771           732      1070
      576       635           727      7353
      495       492           645       481
      492     301.2           635       732
      456       495           617       941
      430       663           599      3353
      399       306           578       578
      387       873           570       570
      385      7455           532      1698
      368       325           521       617
      325       322           511       924
      322       319           502       738
      319       575           492       645
      306     302.2           481       751
      305       672           465       492
    302.3       430           439       635
    302.2      1504           434       427
    301.2       385           427       439

The data for 1940 was previously analysed in [1] to
illustrate use of the tail-index method. The upper tail of
such city size data is generally regarded as being modelled
by Zipf's law, with some tail-index α. Tables 6 and 7 give
the running forecasts, and their standard deviations, for
the next observation, based upon the permutation. We
imagine, in other words, that a random sample has been
taken from the population, and that we successively
forecast the magnitude of each upcoming record value. In
this way we simulate the actual forecasting of future
record values based upon a random sample from a population.
It is well known that sampling (with or without
replacement) from a finite population generates an
exchangeable sequence. Because our forecast of the
magnitude of the next record value depends only upon the
upper order statistics of the past data, and not directly
upon how many past values have been observed, we put forth
the same expectation for the magnitude of the next record
value, until we observe a new record value. The record
values (with the first value taken as a record value by
default) for Table 6 occurred at times 1, 5, 21, and had
the values 1931, 3397, 7455, respectively. Table 6 gives
the 1940 forecasts for L = 3, 5, 10, where each forecast is
based upon all the past data up to the time of the
forecast, and uses only the current upper two order
statistics of the data, so r = 1. The column labelled α̂
gives the current maximum-likelihood estimate of α based
upon the two upper order statistics, so α̂ = 1/t. The first
row of Table 6 would be read as follows. Based upon the two
largest order statistics (1931, 859) at time 2 in the 1940
permuted sequence, the estimate of α is 1.235. This data
(with r = 1 and t = .810) is used to obtain the posterior
distribution for α, for a uniform a priori distribution on
the interval from 0 to 2. Forecasts and standard deviations
are then presented for L = 3, 5, 10. For
example, the L = 3 forecast of the next record value is
3146 with a standard deviation of 1023, this forecast being
made using only the previous records of 1931 and 859. The
realized value turned out to be 3397. Note that most of the
actual values are well within 1 standard deviation of the
forecast. The row '?' forecasts a next record value, based
upon all the past data, as though the population were not
complete, and is given only for illustrative purposes.
Table 7 repeats the analysis for the 1988 city size data.
The record values occurred at trials 1, 4, 7, 14, and had
the values 987, 1647, 2978, 7353, respectively.

Table 6. Forecast of 1940 city sizes × 10^-3

                           Forecast                Forecast SD
City size     α̂       L=3     L=5    L=10      L=3     L=5    L=10
   3397     1.235     3146    3810    4831     1023    1869    3596
   7455     1.770     5500    6621    8295     1787    3244    6166
    (?)     1.272    12137   14694   18608   394[?]  720[?]   13844

Table 7. Forecast of 1988 city sizes × 10^-3

                           Forecast                Forecast SD
City size     α̂       L=3     L=5    L=10      L=3     L=5    L=10
   1647     3.271     1588    1899    2351      516     929    1743
   2978     1.953     2663    3202    4001      865    1568    2973
   7353     1.688     4824    5810    7290     1569    2847    5420
    (?)     1.106    12007   14581   18574     3904    7154   13824
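
The running forecast for the 1940 permutation can be sketched in a few lines. The code below is an illustration under stated assumptions, not the author's program: it takes the posterior for α to be proportional to α exp(-αt) on the interval from 0 to 2 (r = 1, flat a priori distribution, t the log ratio of the two largest values), takes the ratio of the next record to the current one to be a Pareto tail truncated at L, and integrates numerically with a midpoint rule. The permuted values are entered as printed in Table 5, and all function names are hypothetical.

```python
import math

# 1940 city sizes (x 10^-3) in the permuted order printed in Table 5
perm_1940 = [1931, 859, 302.3, 305, 3397, 368, 816, 587, 399, 1623,
             456, 387, 771, 635, 492, 301.2, 495, 663, 306, 873,
             7455, 325, 322, 319, 575, 302.2, 672, 430, 1504, 385]

def record_times(xs):
    """1-based times and values at which a new maximum (record) occurs."""
    out, best = [], -math.inf
    for i, x in enumerate(xs, start=1):
        if x > best:
            out.append((i, x))
            best = x
    return out

def alpha_hat(x1, x2):
    """Estimate of alpha from the two largest values (r = 1): 1 / log(x1 / x2)."""
    return 1.0 / math.log(x1 / x2)

def cond_moment(alpha, L, k):
    """E[Y^k | 1 <= Y <= L] for an assumed Pareto(alpha) ratio Y."""
    den = -math.expm1(-alpha * math.log(L))          # 1 - L**(-alpha)
    if abs(alpha - k) < 1e-12:
        return alpha * math.log(L) / den             # logarithmic limit
    num = -math.expm1((k - alpha) * math.log(L))     # 1 - L**(k - alpha)
    return alpha / (alpha - k) * num / den

def forecast(x1, x2, L, upper=2.0, n=4000):
    """Posterior-averaged forecast and SD of the next record (assumed model)."""
    t = math.log(x1 / x2)
    h = upper / n
    w_sum = m1 = m2 = 0.0
    for i in range(n):
        a = (i + 0.5) * h                 # midpoint of the i-th subinterval
        w = a * math.exp(-a * t)          # alpha^r * exp(-alpha * t) with r = 1
        w_sum += w
        m1 += w * cond_moment(a, L, 1)
        m2 += w * cond_moment(a, L, 2)
    mean = m1 / w_sum
    sd = math.sqrt(m2 / w_sum - mean ** 2)
    return x1 * mean, x1 * sd

records = record_times(perm_1940)
for time, value in records[1:]:
    x1, x2 = sorted(perm_1940[:time - 1], reverse=True)[:2]
    f, s = forecast(x1, x2, L=3)
    print(value, round(alpha_hat(x1, x2), 3), round(f), round(s))
```

With these assumptions the first printed row should land close to the tabulated estimate 1.235, forecast 3146, and standard deviation 1023 for L = 3. Other values of L can be tried by changing the argument in the final loop.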

This type of forecasting problem, based upon a 
random sample from a fixed population, is used to 
illustrate the procedure in connection with an ex- 
changeable sequence of observations. As shown by 
de Finetti, and discussed in [35], one can always 
represent real-world exchangeable sequences in 
terms of limits arising in sampling from a finite 
population. The exchangeable case is the simplest 
scenario in which our methods can be usefully ap- 
plied. More generally, one must deal with evolu- 
tionary processes, as for example when successive 
records are set over time. For example, if we con- 
sider the successive Olympic High Jump records, 
since 1880, we must keep in mind that we are not 
sampling from a fixed population, and that changes 
in technique and general level of physical fitness
over time may have a substantial effect. Similarly,
in considering the next record value of some stock 
market index, such as the Dow Jones, there may be 
time trends that must be taken into account. How- 
ever, even in such examples as these, local ex- 
changeability over sufficiently short time periods 
may be a reasonable assumption, and appropriate 
modification of the basic forecasting procedure 
proposed in this article can be developed. 



6. Conclusions 

We believe that the above studies indicate that it is
possible to make effective inference and predictions about
record values. Our methodology can be used both with
uniform a priori distributions, such as represented in
Tables 2 and 3, and with more informative a priori
distributions such as in Table 4. The case that is perhaps
of greatest interest for applications is that of the
three-parameter log-normal distribution with threshold
taken to be 1 or 0, as may seem appropriate. Uniform a
priori distributions can, for practical purposes, be
represented as special cases of such log-normal
distributions.



We believe that it is important to study sensitivity of
results to choice of a priori distribution, as recommended
in [36,30]. The choice of r and of L can be implemented by
Bayesian data-analytic techniques, such as described in
[1,25]. Here in our forecast of city sizes we took r = 1,
but substantial improvements could result from a Bayesian
decision-theoretic choice of r.

There are some basic issues concerning the use of finite
models, versus infinite idealized models, that are
especially pertinent in connection with the problem of
prediction for long-tailed distributions. If one took the
conventional idealized model literally in our example, then
the analysis of Secs. 1 and 2 demonstrates that there are
some logical difficulties, if one also views the
observations as unbounded. For in the case of greatest
interest, where it is known that 1 ≤ α ≤ 2, the posterior
first moment may be infinite, even though it is plainly
unreasonable to make a prediction of more than a few
multiples of the largest observation yet seen. The issue is
resolved here by treating the algebraic model for the tail
as only an approximation, valid in some finite domain. In
this case the algebraic tail is compatible both with the
data and with putting forth sensible predictions for
squared error loss. See [24] for discussion of the
finite/infinite question in connection with Steinian
shrinkage estimators.
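
The contrast between the idealized and the finite model can be made explicit in a short worked calculation. As a sketch, assume that beyond the current record the ratio Y = X_{n+1}/x_(1) follows a pure algebraic (Pareto) tail with index α, and that L is the finite upper bound for Y; the symbols Y and k are introduced here only for illustration.

```latex
\[
E\,[Y^{k}] \;=\; \int_{1}^{\infty} y^{k}\,\alpha\,y^{-\alpha-1}\,dy
\;=\;
\begin{cases}
\dfrac{\alpha}{\alpha-k}, & \alpha > k,\\[1ex]
\infty, & \alpha \le k,
\end{cases}
\]
\[
E\,[Y^{k}\mid Y \le L] \;=\;
\frac{\alpha}{\alpha-k}\cdot\frac{1-L^{\,k-\alpha}}{1-L^{-\alpha}}
\qquad (\alpha \neq k).
\]
```

The first expression is infinite for k = 2 whenever α ≤ 2, and for k = 1 whenever α ≤ 1, whereas the second is finite for every α > 0 and every finite L, with the limiting value α log L / (1 − L^{-α}) at α = k.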

The issue regarding infinite predictive moments thus turns
out to be largely irrelevant for forecasting, provided that
one is comfortable with using some reasonable upper bound
for the observable variables. Careless use of infinite
models, ignoring the fact that realistic finite upper
bounds are usually available, might instead have led one to
the conclusion that theory-based forecasting is impossible
in the case α ≤ 2. Since all statistical analyses must
eventually be done on a computer with finite memory, such
infinite models are at best only useful guides, and their
careless use can lead to numerous apparent paradoxes, which
have no real-world importance. The primary conclusion of
this article is that provided that a finite upper bound for
the observations can be supplied, as is ordinarily the
case, it is possible to make effective predictions of
future record values. The forecasts that we have obtained,
employing such finite upper bounds, are by no means
perfect, but they do at least put one in the right
ballpark, with predictions that are at most a few multiples
of the previous record value. I am not aware of other
methods available at present that do so.

Forecasting is always difficult, and perhaps even more so
for the case of record values in the case of long-tailed
distributions. Nonetheless, often such forecasts are
important in the decision-making process, and must somehow
or other be put forth. We have suggested a Bayesian
methodology which can make systematic use both of a priori
information and of the current data. When used with care,
we believe these methods can be of value in a variety of
areas.